Abstract

We demonstrate that the lowest possible price change (tick-size) has a large impact on the structure of financial return
distributions. It induces a microstructure and can alter the tail behavior. Furthermore, the tick-size can distort the
calculation of correlations, especially on small return intervals, and thus contributes to the decay of the correlation
coefficient towards smaller return intervals (Epps effect). We study this behavior within a model and identify the effect
in market data. Finally, we present a method to compensate for this purely statistical error.

Keywords: Financial correlations, Epps effect, Market emergence, Covariance estimation, Tick-size, Market
microstructure



1. Introduction 

The lowest possible price change of a financial security, the so-called tick-size or minimum tick, plays an important
role in quantitative finance. All raw price information is discretized by the tick-size. Historically, the tick-size of
most securities has been successively reduced, resulting in tick-sizes of 1/100th of a dollar. This process is often referred
to as decimalization. One reason for it was to enhance market efficiency. In principle, small tick-sizes allow for
a faster clearing of market arbitrage. Nonetheless, it is controversial whether a smaller tick-size generally improves
the market quality [1, 2, 3, 4], e.g., in view of the fact that a larger tick-size ensures liquidity [5]. Furthermore, a
recent study indicates that in some cases only a fraction of the theoretically possible prices are used. Hence, prices
cluster at certain multiples of the tick-size, resulting in an effective tick-size [6].

However, a large tick-size can lead to erroneous data in financial indices due to rounding errors [7]. The current
tick-size for stocks is typically $0.01. This is the case, for instance, on the New York Stock Exchange (NYSE) and
the National Association of Securities Dealers Automated Quotations (NASDAQ). However, some securities, such as
U.S. Government Securities, are still quoted in 1/32nds of a dollar.

The tick-size certainly affects many fields in quantitative finance. In this study we focus on its impact
on two of the most important observables: relative price changes (financial returns) and financial correlations. These
elementary quantities are of particular importance for many applications, for example portfolio optimization [8, 9] and
risk management [10].

The article is organized as follows. In Section 2 we study the influence of the tick-size on the microstructure of
financial return distributions. The impact of the tick-size on the calculation of financial correlations is discussed
in Section 3. The decay of the correlation coefficients towards small return intervals is of particular interest. This
behavior is commonly referred to as the Epps effect [11, 12]. The identified mechanism is caused solely by the discrete
tick-size and therefore represents a statistical effect. Hence, we are able to develop a method for compensating this
distortion. We summarize the results in Section 4.



*Corresponding author. Tel.: +49 203 379 4727; Fax: +49 203 379 4732.
Email address: michael@muennix.com (Michael C. Münnix)



Preprint submitted to Physica A 



July 13, 2010 



2. Financial returns 

Observations of financial data on very small time scales are usually referred to as market microstructure [13]. In
this study, we first investigate the influence of the tick-size on the shape and the microstructure of the financial
return distribution. For this purpose, we decompose the set of returns according to the absolute price changes and
reveal its microstructure.

Subsequently, we demonstrate that this microstructure can alter the tail behavior of the return distribution
compared to the underlying price change distribution. Finally, we establish a relation between the tail
behavior of each microstructure return distribution and that of the overall return distribution.



2.1. Return microstructure 

A financial return describes the relative price change of a security between two points in time. The arithmetic
return is defined as

\[ r_{\Delta t}(t) = \frac{\Delta S_{\Delta t}(t)}{S(t)} , \tag{1} \]

where \(S(t)\) denotes the price at time \(t\) and

\[ \Delta S_{\Delta t}(t) = S(t + \Delta t) - S(t) \tag{2} \]

is the (absolute) price change within the interval \([t, t + \Delta t]\).

As the price change \(\Delta S_{\Delta t}\) can only take values that are multiples of the tick-size \(q\), its histogram consists of equally
spaced peaks, as shown in Fig. 1(a). In other words, the distribution of \(\Delta S_{\Delta t}\) is discretized. At first glance, it is
conceivable that the transition from absolute price changes \(\Delta S_{\Delta t}\) to relative price changes \(r_{\Delta t}\) removes this discretization
from the distribution, since the returns are almost continuously distributed, as Fig. 1(b) illustrates. However, a closer
look at the center of the distribution in Fig. 1(c) reveals that the discretization effects are still visible. Even where it is
not visible, the discretization affects returns on any interval. We discuss this point in more detail in Section 3.
For an analytical description of this discretization, we introduce the set of all returns

\[ R_{\Delta t} = \left\{ \frac{\Delta S_{\Delta t}(t)}{S(t)} \;\middle|\; \Delta S_{\Delta t}(t) \in \{ N_- q, (N_- + 1) q, \ldots, (N_+ - 1) q, N_+ q \} \right\} , \tag{3} \]

where \(N_- q\) defines the lower and \(N_+ q\) the upper bound of the price change distribution that is discretized by the
tick-size \(q\).

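The set of all returns \(R_{\Delta t}\) can be separated into subsets for each price change \(\Delta S_{\Delta t}\),

\[ R_{\Delta t} = \bigcup_{n=N_-}^{N_+} R_{\Delta t}^{(n)} , \tag{4} \]

with

\[ R_{\Delta t}^{(n)} = \left\{ \frac{n q}{S^{(n)}(t)} \;\middle|\; \Delta S_{\Delta t}(t) = n q \right\} . \tag{5} \]

\(S^{(n)}\) in the denominator refers to the subset of starting prices that increase (or decrease) by \(nq\) in the interval \(\Delta t\).
Therefore, \(R_{\Delta t}^{(n)}\) represents the returns that are based on the price change \(nq\). Evidently, for \(n > 0\), \(R_{\Delta t}^{(n)}\) is bounded by

\[ \min\left(R_{\Delta t}^{(n)}\right) = \frac{n q}{\max\left(S^{(n)}\right)} , \qquad \max\left(R_{\Delta t}^{(n)}\right) = \frac{n q}{\min\left(S^{(n)}\right)} , \tag{6} \]

and the roles of \(\min(S^{(n)})\) and \(\max(S^{(n)})\) are interchanged for \(n < 0\).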

In our study, empirical data from the TAQ database [14] of the New York Stock Exchange (NYSE) indicate that the
approximations \(\max(S^{(n)}) \approx \max(S)\) and \(\min(S^{(n)}) \approx \min(S)\) are legitimate for small \(|n|\).
Therefore, the interval between the minimum and maximum return on a specific price change,

\[ I\left(R_{\Delta t}^{(n)}\right) = \left[ \min\left(R_{\Delta t}^{(n)}\right), \max\left(R_{\Delta t}^{(n)}\right) \right] , \tag{7} \]

increases with \(|n|\), while the distance \(d\) between the centers of neighboring intervals remains almost constant,

\[ d^{(n)} = \frac{q}{2} \left( \frac{1}{\min(S^{(n)})} + \frac{1}{\max(S^{(n)})} \right) \approx \frac{q}{2} \left( \frac{1}{\min(S)} + \frac{1}{\max(S)} \right) . \tag{8} \]

Thus, the intervals \(I(R_{\Delta t}^{(n)})\) overlap increasingly for larger \(|n|\). From this viewpoint, the discretization is only
"visible" for small \(|n|\), that is, for small price changes. Fig. 1(c) illustrates the clustering of returns with an example
in which we compare the returns of the Apollo Group Inc. (APOL) share with the intervals \(I(R_{\Delta t}^{(n)})\) calculated by equations
(7) and (8). The calculated boundaries match the empirical data.

Figure 1: Distribution of 1-minute returns and price changes of the Apollo Group Inc. (APOL) share in the first half of 2007. Fig. (a)
shows the distribution of the corresponding price changes, while Fig. (b) shows the full return distribution. Fig. (c) shows only the center of
the distribution, with the calculated bounds of the returns corresponding to a specific price change indicated by the blue regions. Darker
shades of blue indicate overlapping bounds.
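To make the overlap argument concrete, the following minimal Python sketch evaluates equations (6) to (8) under the same approximation \(S^{(n)} \approx S\), with a price range \([S_{\min}, S_{\max}]\) and tick-size \(q\) chosen purely for illustration (they are not the APOL data). Requiring \(\max(R^{(n)}) \ge \min(R^{(n+1)})\), i.e. \(nq/S_{\min} \ge (n+1)q/S_{\max}\), gives the smallest price change at which neighboring intervals touch or overlap, \(n^* = \lceil S_{\min}/(S_{\max} - S_{\min}) \rceil\). The helper names are hypothetical; exact fractions avoid floating-point ties at the boundary.

```python
import math
from fractions import Fraction

def return_interval(n, q, s_min, s_max):
    """Bounds of I(R^(n)) from eqs. (6)-(7), assuming the prices S^(n) span [s_min, s_max]."""
    a, b = n * q / s_max, n * q / s_min
    return min(a, b), max(a, b)  # for n < 0 the roles of s_min and s_max swap

def first_overlap(q, s_min, s_max):
    """Smallest n > 0 for which I(R^(n)) and I(R^(n+1)) touch or overlap."""
    if not s_max > s_min:
        raise ValueError("requires s_max > s_min")
    n = 1
    while return_interval(n, q, s_min, s_max)[1] < return_interval(n + 1, q, s_min, s_max)[0]:
        n += 1
    return n

q = Fraction(1, 100)                       # tick-size of $0.01 (exact arithmetic)
s_min, s_max = Fraction(50), Fraction(55)  # illustrative price range, not market data
n_star = first_overlap(q, s_min, s_max)
closed_form = math.ceil(s_min / (s_max - s_min))
width = [float(q * n * (1 / s_min - 1 / s_max)) for n in (1, 5, 10)]  # grows linearly with n
spacing = float(q / 2 * (1 / s_min + 1 / s_max))                      # eq. (8), independent of n
print(n_star, closed_form, width, spacing)
```

For a price range of 10% around $50, the return clusters therefore merge from about ten ticks onward, which is why the discretization is only visible in the center of the distribution.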

2.2. Tail behavior of return and price change distribution 

We now investigate how the composition of the returns changes the shape of their distribution compared to
the distribution of price changes. Within a model, we generate price changes that are, in a first scenario,
Gaussian distributed and, in a second scenario, power-law distributed with a given tick-size. Afterwards, we calculate
returns using uniformly distributed price values within the range \([S_{\min}, S_{\max}]\) (analogously to Figs. 1(c) and 1(b)).
In this manner, we generate a discrete price change distribution with a specific shape and then divide each set of equal
price changes by uniformly distributed prices. The price distributions are generated individually for each subset.

[Figure 2, panels (horizontal axes: normalized price changes \(\Delta S\) and normalized returns \(g\)): (a) Gaussian \(\Delta S\), \(S_{\max}/S_{\min} = 1.1\); (b) Gaussian \(\Delta S\), \(S_{\max}/S_{\min} = 1.5\); (c) Gaussian \(\Delta S\), \(S_{\max}/S_{\min} = 2.0\); (d) power-law \(\Delta S\), \(S_{\max}/S_{\min} = 1.1\); (e) power-law \(\Delta S\), \(S_{\max}/S_{\min} = 1.5\); (f) power-law \(\Delta S\), \(S_{\max}/S_{\min} = 2.0\).]

Figure 2: Comparison of the distributions of normalized price changes \(\Delta S\) and normalized returns \(g\) for different price ranges \(S_{\max}/S_{\min}\).
Figs. (a), (b) and (c) have been calculated using Gaussian distributed price changes. Figs. (d), (e) and (f) have been calculated using
power-law distributed price changes. All calculations were performed using a standard deviation of 60 tick-sizes.

To compare the shape of the obtained return distribution with the shape that we have chosen for the price change
distribution, we normalize the distributions to zero mean and unit variance,

\[ g_i(t) = \frac{r_i(t) - \langle r_i \rangle}{\sigma_i} , \tag{9} \]

where \(\langle \ldots \rangle\) denotes the mean value of a time series of length \(T\) and \(\sigma_i\) refers to the standard deviation of the
same time series. The index \(i\) corresponds to a certain security, e.g., a stock. The price changes are normalized analogously.

The results of this simple setup indicate that neither the tick-size nor the width of the price change distribution nor
the absolute values of \(S_{\min}\) and \(S_{\max}\) affect the shape of the obtained return distribution. Only the microstructure
of its center is affected, as discussed in the previous section. In general, the return distribution acquires stronger
tails than the price change distribution. Surprisingly, the change in shape depends only on the ratio of the maximum
and minimum price, \(S_{\max}/S_{\min}\).
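The toy model described above can be reproduced in a few lines. The sketch below is a minimal version, assuming Gaussian price changes with a standard deviation of 60 ticks (as in Fig. 2) and prices drawn uniformly from \([S_{\min}, S_{\max}]\) independently of the price change; the function names are hypothetical. For independent Gaussian \(\Delta S\) and \(S\), the kurtosis of \(r = \Delta S/S\) equals \(3 \langle S^{-4} \rangle / \langle S^{-2} \rangle^2\), which does not change when \(S\) is rescaled and therefore depends only on \(S_{\max}/S_{\min}\), in line with the observation above.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis (0 for a Gaussian)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**4) - 3.0)

def simulate_returns(price_ratio, n=1_000_000, sigma_ticks=60.0, q=0.01, s_min=10.0, seed=1):
    """Discretized Gaussian price changes divided by uniformly distributed prices."""
    rng = np.random.default_rng(seed)
    dS = q * np.round(rng.normal(0.0, sigma_ticks, n))  # multiples of the tick-size q
    S = rng.uniform(s_min, price_ratio * s_min, n)      # uniform prices in [S_min, S_max]
    return dS, dS / S

def analytic_excess_kurtosis(price_ratio):
    """3 <S^-4> / <S^-2>^2 - 3 for S uniform on [1, price_ratio] and Gaussian dS."""
    a, b = 1.0, price_ratio
    m2 = (1 / a - 1 / b) / (b - a)              # <S^-2>
    m4 = (1 / a**3 - 1 / b**3) / (3 * (b - a))  # <S^-4>
    return 3.0 * m4 / m2**2 - 3.0

for ratio in (1.1, 1.5, 2.0):
    dS, r = simulate_returns(ratio)
    print(ratio, round(excess_kurtosis(dS), 3), round(excess_kurtosis(r), 3),
          round(analytic_excess_kurtosis(ratio), 3))
```

For \(S_{\max}/S_{\min} = 2\) the analytic excess kurtosis of the returns is exactly 0.5, while the price changes stay Gaussian; for a ratio of 1.1 the effect is below 0.01.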

Figure 2 shows the corresponding distributions for Gaussian and power-law distributed price changes and for various price
ranges. It turns out that the influence on the tail behavior is much stronger for a Gaussian price change distribution.
For a power-law price change distribution, the return distribution retains almost the same power-law shape, except for
the tails far out, while its center becomes slightly sharper.

Of course, the assumption of uniformly distributed prices for each price change is a rough approximation within
this simple setup. In the market, there can be a strong relation between \(\Delta S\) and \(S\), which causes the return distribution
to retain the shape of the price change distribution. This is because the prices that undergo a very large price
change during the interval \(\Delta t\) can be much more sparsely distributed than prices that change only slightly. Furthermore,
the price range is usually not very large within a period of time in which the price distribution is approximately
uniform. In view of this, and under the assumption of power-law distributed price changes, the situation in Figs. 2(d)
and 2(e) may describe most stocks suitably. Put differently, in most cases the return distribution almost retains the
shape of the price change distribution.

[Figure 3, panels: (a) \(\Delta t = 5\) min; (b) \(\Delta t = 1\) d.]

Figure 3: Change in the tail behavior of the distribution of normalized returns \(g\) compared to the underlying normalized price changes \(\Delta S\). The
price changes and returns have been calculated for an ensemble of 50 stocks from the S&P 500 index using return intervals of 5 minutes and 1 day.
The stocks were chosen to provide the highest ratio between price variance and mean price.

However, if the price of a stock covers a large range in a relatively short period of time, we can indeed observe a
change in the tail behavior. This is illustrated in Fig. 3 for an ensemble of 50 stocks taken from the S&P 500 index
(see Table A.1). The stocks have been chosen to provide the highest ratio between the standard deviation of their price and
its mean. Although the stock ensemble shows the expected behavior, it is difficult to make an accurate statement
regarding the tails far out, as these events are very rare, even within this statistical ensemble.



2.3. Tail behavior of the return microstructure 

Another question arises in this regard: if the return distribution is heavy-tailed, how is this connected to the tail
behavior of the subsets of returns? As demonstrated in Section 2.1, the set of all returns can be divided
into subsets that correspond to a certain price change. Do these subsets feature stronger or weaker tails
than the complete return distribution?

In Fig. 4 we compare the kurtosis of the return subsets, normalized to the kurtosis of the overall return distribution.
As it is difficult to perform a proper normalization for a stock ensemble in this graphical representation, we show the
result for the Google Inc. (GOOG) share as an example.

We make two observations. First, there seems to be a connection between the tail behavior and the return interval
for the return subset distributions: the return subset distributions feature stronger tails for smaller price changes.
Second, and surprisingly, the return subset distributions exhibit a much smaller kurtosis than the complete return
distribution. The strong tails of the complete distribution only emerge when all the return subset distributions are combined.
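Within the uniform-price toy model of Section 2.2 (an assumption; market prices are not uniformly distributed), every subset \(R_{\Delta t}^{(n)}\) with \(n \neq 0\) is simply the distribution of \(1/S\) scaled by \(nq\), so all subsets share the same kurtosis. The toy model therefore cannot reproduce the dependence on the price change seen in Fig. 4, but it does reproduce the second observation: the subsets are much less heavy-tailed than their union. The following sketch, with hypothetical names and \(S_{\max}/S_{\min} = 2\), checks this.

```python
import numpy as np

def kurtosis(x):
    """Sample kurtosis (3 for a Gaussian)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**4))

rng = np.random.default_rng(2)
n_samples, sigma_ticks, q = 1_000_000, 60.0, 0.01
ticks = np.round(rng.normal(0.0, sigma_ticks, n_samples)).astype(int)  # Delta S / q
S = rng.uniform(10.0, 20.0, n_samples)                                # S_max / S_min = 2
r = ticks * q / S

overall = kurtosis(r)
# kurtosis of a few subsets R^(n); each subset has several thousand members here
subsets = {n: kurtosis(r[ticks == n]) for n in (1, 10, 60)}
print(round(overall, 2), {n: round(k, 2) for n, k in subsets.items()})
```

In this setting the overall kurtosis is close to 3.5, whereas each subset has the kurtosis of \(1/S\), roughly 2.1.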



3. Financial correlations 

We now turn to the impact of the tick-size on the calculation of correlations and analyze the influence on the 
decay of correlation coefficients towards smaller return intervals (Epps effect). Financial correlations are an important 
measure in economics. The knowledge of precise correlations is essential for quantifying and minimizing financial 
risk. As we will show, the discreteness of stock quotes can distort the calculated correlation coefficients. 
Figure 4: Kurtosis of the return distributions for a specific price change \(\Delta S\) compared to that of the complete return distribution. The
negative peak at \(\Delta S = 0\) originates from the fact that this return subset "distribution" of \(\Delta S/S\) only contains returns with the value zero
and therefore leads to a value of \(-3/\mathrm{kurt}(\Delta S/S)\).



A financial return is a compound observable. Therefore, we develop the compensation method step by step. We start
in Section 3.1, where we turn to the distortion of the correlation coefficient of value-discretized time series in general.
In Section 3.2, we develop a compensation for the discretization error in the correlation between financial (absolute)
price changes. In Section 3.3, we extend this formalism to financial returns.

It is a basic assumption of our model that we can statistically describe the discreteness of market prices by a
discretization of a hypothetical underlying continuous price. This is not to say that market prices actually result from
a discretization process. Individual traders are well aware of the finite tick-size and may try to exploit it in their trading
strategies. However, a large variety of trading strategies act on the market simultaneously, and these strategies
span a wide range of investment horizons. Since the price formation results from the interaction of a large
diversity of strategies, the price fluctuations on the level of the tick-size can be viewed as purely statistical. This is the
basis for our modeling ansatz.

Apart from the interpolation of the price change distribution, neither parameter fixing nor calibration of the model is
necessary, in contrast to many other compensation methods for the Epps effect [15, 16, 17, 18, 19, 20].

3.1. Calculation of the correlation coefficient for value-discretized time series 

Almost any data time series is discretized. This can simply be caused by numerical reasons, such as a finite
number of decimal places. But how can we measure the impact of the discretization, or even compensate for it? We
will show that this can be achieved by decomposing the correlation coefficient and estimating the
average discretization errors.

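Let \(x_1\) and \(x_2\) be two correlated time series. The correlation coefficient of \(x_1\) and \(x_2\) is given by

\[ \mathrm{corr}(x_1, x_2) = \frac{\langle x_1 x_2 \rangle - \langle x_1 \rangle \langle x_2 \rangle}{\sigma_1 \sigma_2} . \tag{10} \]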



Now we consider the time series \(\hat{x}_1\) and \(\hat{x}_2\), which are the discretized values of \(x_1\) and \(x_2\) with tick-sizes \(q_1\) and \(q_2\),
respectively. Thus, we have

\[ \hat{x}_1(t) = x_1(t) - \vartheta^{(1)}(t) , \tag{11} \]

\[ \hat{x}_2(t) = x_2(t) - \vartheta^{(2)}(t) , \tag{12} \]

where \(\vartheta^{(1)}(t)\) and \(\vartheta^{(2)}(t)\) are the discretization errors. We assume the discretization errors to be uniformly distributed
in the intervals \(]-q_1/2, q_1/2]\) and \(]-q_2/2, q_2/2]\). This seems natural, as a discretization is commonly caused by a
rounding process.
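Using equations (11) and (12), we can write the correlation coefficient (10) as

\[ \mathrm{corr}(x_1, x_2) = \frac{\left\langle (\hat{x}_1 + \vartheta^{(1)})(\hat{x}_2 + \vartheta^{(2)}) \right\rangle - \left( \langle \hat{x}_1 \rangle + \langle \vartheta^{(1)} \rangle \right) \left( \langle \hat{x}_2 \rangle + \langle \vartheta^{(2)} \rangle \right)}{\sqrt{\mathrm{var}(\hat{x}_1 + \vartheta^{(1)})} \sqrt{\mathrm{var}(\hat{x}_2 + \vartheta^{(2)})}} \tag{13} \]

\[ = \frac{\mathrm{cov}(\hat{x}_1, \hat{x}_2) + \mathrm{cov}(\hat{x}_1, \vartheta^{(2)}) + \mathrm{cov}(\hat{x}_2, \vartheta^{(1)}) + \mathrm{cov}(\vartheta^{(1)}, \vartheta^{(2)})}{\sqrt{\mathrm{var}(\hat{x}_1) + \mathrm{var}(\vartheta^{(1)}) + 2\,\mathrm{cov}(\hat{x}_1, \vartheta^{(1)})} \sqrt{\mathrm{var}(\hat{x}_2) + \mathrm{var}(\vartheta^{(2)}) + 2\,\mathrm{cov}(\hat{x}_2, \vartheta^{(2)})}} . \tag{14} \]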
(14) 



Apart from the terms \(\mathrm{cov}(\hat{x}_1, \hat{x}_2)\), \(\mathrm{var}(\hat{x}_1)\) and \(\mathrm{var}(\hat{x}_2)\) of expression (14), which can be calculated from the
discretized data, all other terms are lost in the discretization process. However, these terms can be estimated when the
distributions \(\varrho_{x_1}\) and \(\varrho_{x_2}\) of \(x_1\) and \(x_2\) are known, as we will demonstrate. The continuous distributions \(\varrho_{x_1}\) and \(\varrho_{x_2}\)
can be obtained by interpolating the distributions of the discretized values (in the following, we assume these distributions
to be normalized). Sometimes, the shape of the distribution for a certain process is known (e.g., Gaussian). In this case,
the interpolated distribution function can be determined by a fit to the distributions of \(\hat{x}_1\) and \(\hat{x}_2\).

If the shape of the distribution is unknown, an interpolation can be performed piecewise, using, e.g., polynomial
or linear fits. These fits cannot be performed in the usual way, by minimizing the difference between the values of the
discrete distribution and the desired fit function. Rather, the discretization process needs to be included. This
becomes particularly important when the level of discretization is high and the distribution is thus represented by only
a small number of discrete values.

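As a value that has been discretized to, e.g., \(\hat{x}_1 = n q_1\) can originate from the region between \(n q_1 - q_1/2\) and \(n q_1 + q_1/2\),
the difference function \(f\), which provides a measure for the residual between the fit and the empirical data, is then given by

\[ f_{x_1}(\varrho_{x_1}, \hat{\varrho}_{x_1}) = \sum_{n=N_-}^{N_+} \left( \int_{q_1 (n - \frac{1}{2})}^{q_1 (n + \frac{1}{2})} \varrho_{x_1}(z)\, dz - \hat{\varrho}_{x_1}(n q_1) \right)^2 \tag{15} \]

for \(x_1\), where \(\hat{\varrho}_{x_1}\) denotes the distribution of the discretized values, and analogously for \(x_2\).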

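To compensate for the overall discretization error, we first introduce the discretization errors that led to a certain
discretized value. We call these errors conditional discretization errors. They are defined as

\[ \vartheta_n^{(1)}(t) = x_1(t) - n q_1 , \qquad t \in \{ t \mid \hat{x}_1(t) = n q_1 \} , \tag{16} \]

\[ \vartheta_m^{(2)}(t) = x_2(t) - m q_2 , \qquad t \in \{ t \mid \hat{x}_2(t) = m q_2 \} , \tag{17} \]

\[ \vartheta_{n,m}^{(1)}(t) = x_1(t) - n q_1 , \qquad t \in \{ t \mid \hat{x}_1(t) = n q_1 \wedge \hat{x}_2(t) = m q_2 \} , \tag{18} \]

\[ \vartheta_{n,m}^{(2)}(t) = x_2(t) - m q_2 , \qquad t \in \{ t \mid \hat{x}_1(t) = n q_1 \wedge \hat{x}_2(t) = m q_2 \} . \tag{19} \]

Here, \(\vartheta_n^{(1)}\) and \(\vartheta_m^{(2)}\) are the discretization errors that resulted in the discrete values \(\hat{x}_1 = n q_1\) and \(\hat{x}_2 = m q_2\),
respectively, where \(n\) and \(m\) are integers. Consequently, \(\vartheta_{n,m}^{(1)}\) and \(\vartheta_{n,m}^{(2)}\) are the discretization errors that led to the
values \(\hat{x}_1 = n q_1\) and \(\hat{x}_2 = m q_2\), respectively, while the other (correlated) time series was simultaneously discretized to
\(\hat{x}_2 = m q_2\) or \(\hat{x}_1 = n q_1\). In all cases, \(t\) runs over the set of time points at which these discretizations actually occur.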

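Using the interpolated distribution functions \(\varrho_{x_1}(x)\) and \(\varrho_{x_2}(y)\) and the interpolated joint distribution function
\(\varrho_{x_1,x_2}(x, y)\), the average discretization errors can be calculated as

\[ \langle \vartheta_n^{(1)} \rangle = \int_{q_1 (n - \frac{1}{2})}^{q_1 (n + \frac{1}{2})} (z - n q_1)\, \varrho_{x_1}(z)\, dz \Big/ \int_{q_1 (n - \frac{1}{2})}^{q_1 (n + \frac{1}{2})} \varrho_{x_1}(z)\, dz , \tag{20} \]

\[ \langle \vartheta_m^{(2)} \rangle = \int_{q_2 (m - \frac{1}{2})}^{q_2 (m + \frac{1}{2})} (z - m q_2)\, \varrho_{x_2}(z)\, dz \Big/ \int_{q_2 (m - \frac{1}{2})}^{q_2 (m + \frac{1}{2})} \varrho_{x_2}(z)\, dz , \tag{21} \]

\[ \langle \vartheta_{n,m}^{(1)} \rangle = \int_{q_1 (n - \frac{1}{2})}^{q_1 (n + \frac{1}{2})} (z - n q_1)\, \varrho_{x_1,x_2}(z, m q_2)\, dz \Big/ \int_{q_1 (n - \frac{1}{2})}^{q_1 (n + \frac{1}{2})} \varrho_{x_1,x_2}(z, m q_2)\, dz , \tag{22} \]

\[ \langle \vartheta_{n,m}^{(2)} \rangle = \int_{q_2 (m - \frac{1}{2})}^{q_2 (m + \frac{1}{2})} (z - m q_2)\, \varrho_{x_1,x_2}(n q_1, z)\, dz \Big/ \int_{q_2 (m - \frac{1}{2})}^{q_2 (m + \frac{1}{2})} \varrho_{x_1,x_2}(n q_1, z)\, dz , \tag{23} \]

where

\[ \int_{-\infty}^{+\infty} \varrho_{x_1,x_2}(x_1(t), z)\, dz = \varrho_{x_1}(x_1(t)) \quad \text{and} \tag{24} \]

\[ \int_{-\infty}^{+\infty} \varrho_{x_1,x_2}(z, x_2(t))\, dz = \varrho_{x_2}(x_2(t)) . \tag{25} \]

Therefore, the overall average discretization errors can be written as

\[ \langle \vartheta^{(1)} \rangle = \frac{1}{T} \sum_{n=N_-}^{N_+} T_n \langle \vartheta_n^{(1)} \rangle , \tag{26} \]

\[ \langle \vartheta^{(2)} \rangle = \frac{1}{T} \sum_{m=M_-}^{M_+} T_m \langle \vartheta_m^{(2)} \rangle , \tag{27} \]

where \(T_n\) and \(T_m\) are the numbers of values that have been discretized to \(n q_1\) and \(m q_2\), respectively.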
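Now we can calculate the discretization terms of equation (14). We begin with

\[ \mathrm{cov}(\hat{x}_1, \vartheta^{(2)}) = \frac{1}{T} \sum_{t=1}^{T} \hat{x}_1(t)\, \vartheta^{(2)}(t) - \langle \hat{x}_1 \rangle \langle \vartheta^{(2)} \rangle \tag{28} \]

\[ = \frac{1}{T} \sum_{n=N_-}^{N_+} \sum_{m=M_-}^{M_+} \sum_{t'=1}^{T_{n,m}} n q_1\, \vartheta_{n,m}^{(2)}(t') - \langle \hat{x}_1 \rangle \langle \vartheta^{(2)} \rangle \tag{29} \]

\[ \approx \frac{1}{T} \sum_{n=N_-}^{N_+} \sum_{m=M_-}^{M_+} T_{n,m}\, n q_1 \langle \vartheta_{n,m}^{(2)} \rangle - \langle \hat{x}_1 \rangle \langle \vartheta^{(2)} \rangle . \tag{30} \]

Here, \(q_1 N_-\) represents the minimum of the discretized time series \(\hat{x}_1(t)\) and \(q_1 N_+\) its maximum. \(T\) is the length of the
whole time series, while \(T_{n,m}\) is the number of synchronous pairs of both time series that are discretized to \(n q_1\) and
\(m q_2\). We index these pairs with \(t'\), referring to the corresponding points in time.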
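Analogously, the other discretization terms of equation (14) can be calculated as

\[ \mathrm{cov}(\hat{x}_2, \vartheta^{(1)}) \approx \frac{1}{T} \sum_{n=N_-}^{N_+} \sum_{m=M_-}^{M_+} T_{n,m}\, m q_2 \langle \vartheta_{n,m}^{(1)} \rangle - \langle \hat{x}_2 \rangle \langle \vartheta^{(1)} \rangle , \tag{31} \]

\[ \mathrm{cov}(\hat{x}_1, \vartheta^{(1)}) \approx \frac{1}{T} \sum_{n=N_-}^{N_+} T_n\, n q_1 \langle \vartheta_n^{(1)} \rangle - \langle \hat{x}_1 \rangle \frac{1}{T} \sum_{n=N_-}^{N_+} T_n \langle \vartheta_n^{(1)} \rangle , \tag{32} \]

\[ \mathrm{cov}(\hat{x}_2, \vartheta^{(2)}) \approx \frac{1}{T} \sum_{m=M_-}^{M_+} T_m\, m q_2 \langle \vartheta_m^{(2)} \rangle - \langle \hat{x}_2 \rangle \frac{1}{T} \sum_{m=M_-}^{M_+} T_m \langle \vartheta_m^{(2)} \rangle , \tag{33} \]

\[ \mathrm{var}(\vartheta^{(1)}) \approx \frac{q_1^2}{12} , \tag{34} \]

\[ \mathrm{var}(\vartheta^{(2)}) \approx \frac{q_2^2}{12} . \tag{35} \]

The terms (34) and (35) are estimated under the assumption that the discretization errors are uniformly distributed.
In general, the remaining term \(\mathrm{cov}(\vartheta^{(1)}, \vartheta^{(2)})\) cannot be calculated from the distribution functions, as it contains the
correlation between the discretization errors. This value is not necessarily connected to the correlation of the whole
time series either. Yet, we will show below that this term is negligible in the present context.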

Thus, we have shown that the error caused by the discretization can be estimated by decomposing the correlation 
coefficient and approximating the mean discretization errors by interpolating the discrete distributions. 
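The estimation steps of this subsection can be illustrated with a minimal Python sketch. It assumes that both underlying series are Gaussian (the "known shape" case above), so that the interpolated density is a Gaussian whose width is fitted with a squared version of the residual (15) (the exact form of the residual is our assumption), and it evaluates equations (20), (32) and (34) in closed form, with the error convention \(\vartheta = x - \hat{x}\). Unlike the full method, it keeps only the terms that enter the standard deviations and neglects \(\mathrm{cov}(\hat{x}_1, \vartheta^{(2)})\), \(\mathrm{cov}(\hat{x}_2, \vartheta^{(1)})\) and \(\mathrm{cov}(\vartheta^{(1)}, \vartheta^{(2)})\). All function names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def gauss_cdf(x, mu, sigma):
    return 0.5 * (1.0 + _erf((np.asarray(x, dtype=float) - mu) / (sigma * math.sqrt(2.0))))

def gauss_pdf(x, mu, sigma):
    z = (np.asarray(x, dtype=float) - mu) / sigma
    return np.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_sigma(levels, p_hat, q, mu, upper):
    """Golden-section fit of sigma, minimising a squared version of the residual (15):
    the discrete value n*q collects the Gaussian mass of [(n - 1/2) q, (n + 1/2) q]."""
    def residual(s):
        mass = gauss_cdf((levels + 0.5) * q, mu, s) - gauss_cdf((levels - 0.5) * q, mu, s)
        return float(np.sum((mass - p_hat) ** 2))
    lo, hi = 0.05 * q, upper
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(80):
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
        if residual(a) < residual(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

def corrected_variance(xhat, q):
    """var(xhat) + var(theta) + 2 cov(xhat, theta), using eqs. (20), (32) and (34)."""
    n = np.round(xhat / q).astype(int)
    levels, inverse, counts = np.unique(n, return_inverse=True, return_counts=True)
    mu = float(xhat.mean())  # rounding is symmetric, so the sample mean is kept
    sigma = fit_sigma(levels, counts / counts.sum(), q, mu, 3.0 * (float(xhat.std()) + q))
    a, b = (levels - 0.5) * q, (levels + 0.5) * q
    mass = gauss_cdf(b, mu, sigma) - gauss_cdf(a, mu, sigma)
    # eq. (20) in closed form for a Gaussian density restricted to [a, b]
    theta_n = mu + sigma**2 * (gauss_pdf(a, mu, sigma) - gauss_pdf(b, mu, sigma)) / mass - levels * q
    theta = theta_n[inverse.ravel()]  # expected error <theta_n> at every time step
    cov_x_theta = np.mean(xhat * theta) - xhat.mean() * theta.mean()  # eq. (32)
    return float(np.var(xhat) + q * q / 12.0 + 2.0 * cov_x_theta)     # eq. (34): q^2 / 12

def compensated_corr(xhat1, xhat2, q1, q2):
    """Correlation (14) keeping only the variance-related correction terms."""
    cov12 = np.mean(xhat1 * xhat2) - xhat1.mean() * xhat2.mean()
    return float(cov12 / math.sqrt(corrected_variance(xhat1, q1) * corrected_variance(xhat2, q2)))
```

As a usage example, rounding two unit-variance Gaussian series with true correlation 0.5 to a tick-size of \(q = 1\) lowers the naive correlation to about \(0.5/(1 + 1/12) \approx 0.46\), while `compensated_corr` returns a value close to 0.5.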

3.2. Distortion of price change correlations 

We now turn to the specific situation on the stock market, which differs when the method from the previous section
is applied to stock price changes. Here, the discretization does not act on the observable itself.
Instead, the price change \(\Delta S\) is the difference of two prices \(S(t)\) and \(S(t + \Delta t)\) that are discretized by the tick-size \(q\).

Therefore, the discretization error of a specific price difference \(\Delta\hat{S}\) can lie in the range from \(-q\) to \(q\). However,
the probability that a certain value originates from a price difference within this range is not constant. It is described by
a triangular distribution (see Fig. 5(a)). This is evident, as the discretization error is the difference of two
uniformly distributed discretization errors. The triangular distribution \(\varrho_{\mathrm{Tri}}\) around a certain price change
\(\Delta\hat{S}\) vanishes at \(\Delta\hat{S} - q\) and \(\Delta\hat{S} + q\) and is normalized to the value 1 at its maximum at \(\Delta\hat{S}\). It reads

\[ \varrho_{\mathrm{Tri}}(x, \Delta\hat{S}) = \begin{cases} \dfrac{x - \Delta\hat{S} + q}{q} , & \Delta\hat{S} - q \le x \le \Delta\hat{S} , \\[1ex] \dfrac{\Delta\hat{S} + q - x}{q} , & \Delta\hat{S} < x \le \Delta\hat{S} + q , \\[1ex] 0 , & \text{else} . \end{cases} \tag{36} \]



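The average discretization errors now have to be calculated with the product of the triangular distribution \(\varrho_{\mathrm{Tri}}\) and the
interpolated price change distributions \(\varrho_{\Delta S_1}\), \(\varrho_{\Delta S_2}\) (and proper normalization). Thus,

\[ \langle \vartheta_n^{(1)} \rangle = \int_{q_1 (n-1)}^{q_1 (n+1)} (z - n q_1)\, \varrho_{\Delta S_1}(z)\, \varrho_{\mathrm{Tri}}(z, n q_1)\, dz \Big/ \int_{q_1 (n-1)}^{q_1 (n+1)} \varrho_{\Delta S_1}(z)\, \varrho_{\mathrm{Tri}}(z, n q_1)\, dz , \tag{37} \]

\[ \langle \vartheta_{n,m}^{(1)} \rangle = \int_{q_1 (n-1)}^{q_1 (n+1)} (z - n q_1)\, \varrho_{\Delta S_1, \Delta S_2}(z, m q_2)\, \varrho_{\mathrm{Tri}}(z, n q_1)\, dz \Big/ \int_{q_1 (n-1)}^{q_1 (n+1)} \varrho_{\Delta S_1, \Delta S_2}(z, m q_2)\, \varrho_{\mathrm{Tri}}(z, n q_1)\, dz . \tag{38} \]

\(\langle \vartheta_m^{(2)} \rangle\) and \(\langle \vartheta_{n,m}^{(2)} \rangle\) are defined analogously.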

Fig. 5(b) shows, as an example, the product of a triangular distribution and a power-law distribution. The denominator
in equation (37) corresponds to the area under this curve. The triangular distribution also needs to be included in the fitting
process. Thus, the difference function becomes
Figure 5: (a) Exemplary distribution of discretization errors around a price change of \(\Delta S = 0.1\) for a tick-size of \(q = 0.01\). (b) Product of
this distribution with a power-law price change distribution \(\varrho_{\Delta S}(x)\).



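\[ f_{\Delta S}(\varrho_{\Delta S}, \hat{\varrho}_{\Delta S}) = \sum_{n} \left( \int_{q (n-1)}^{q (n+1)} \varrho_{\mathrm{Tri}}(z, n q) \left[ \varrho_{\Delta S}(z) - \frac{\hat{\varrho}_{\Delta S}(n q)}{q} \right] dz \right)^2 \tag{39} \]

\[ = \sum_{n} \left( \int_{q (n-1)}^{q (n+1)} \varrho_{\mathrm{Tri}}(z, n q)\, \varrho_{\Delta S}(z)\, dz - \hat{\varrho}_{\Delta S}(n q) \right)^2 , \tag{40} \]

where \(\hat{\varrho}_{\Delta S}\) refers to the discretized distribution. \(\varrho_{\mathrm{Tri}}\) acts as a weighting function in the residual measure: it
provides a weight corresponding to the probability that the difference of the originating discretization errors results in the
value \(z\).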



Now we are able to estimate the discretization error of the correlation coefficient with the previously defined equations (30) to (35).
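For price changes, the two new ingredients are the triangular weight (36) and the weighted averages (37). The sketch below, assuming a Gaussian price change density purely for illustration and using hypothetical helper names, evaluates (37) by trapezoidal integration. It also checks by Monte Carlo that the discretization error of a difference of two rounded prices lies within \([-q, q]\) and has variance \(q^2/6\), the variance of a triangular distribution on \([-q, q]\).

```python
import numpy as np

def rho_tri(x, center, q):
    """Triangular weight of eq. (36): 1 at x == center, 0 at center -/+ q."""
    return np.clip(1.0 - np.abs(x - center) / q, 0.0, None)

def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def mean_error(density, n, q, grid=2001):
    """<theta_n> of eq. (37): mean of (z - nq) weighted by density(z) * rho_tri(z, nq)."""
    z = np.linspace((n - 1) * q, (n + 1) * q, grid)
    w = density(z) * rho_tri(z, n * q, q)
    return trapezoid((z - n * q) * w, z) / trapezoid(w, z)

q, sigma = 0.01, 0.02
gauss = lambda z: np.exp(-0.5 * (z / sigma) ** 2)  # unnormalized; the ratio in (37) cancels it
print([round(mean_error(gauss, n, q) / q, 4) for n in range(5)])

# Monte Carlo: error of a price change built from two independently rounded prices
rng = np.random.default_rng(4)
S0 = rng.uniform(10.0, 20.0, 1_000_000)
S1 = S0 + rng.normal(0.0, sigma, S0.size)
dS_hat = q * np.round(S1 / q) - q * np.round(S0 / q)
theta = (S1 - S0) - dS_hat
print(theta.var() / q**2, 1 / 6)
```

For \(n > 0\) the conditional mean error is negative: for a density that decreases away from zero, a discretized value \(nq\) more likely stems from a smaller true price change.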



3.3. Distortion of return correlations 

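When calculating the correlation of financial returns as defined in equation (1), the situation becomes more complex,
as we also have to take the prices into account. The correlation coefficient (14) for two return time series \(r_1\)
and \(r_2\) now reads

\[ \mathrm{corr}(r_1, r_2) = \frac{\mathrm{cov}(\hat{r}_1, \hat{r}_2) + \mathrm{cov}(\hat{r}_1, \vartheta^{(2)}) + \mathrm{cov}(\hat{r}_2, \vartheta^{(1)}) + \mathrm{cov}(\vartheta^{(1)}, \vartheta^{(2)})}{\sqrt{\mathrm{var}(\hat{r}_1) + \mathrm{var}(\vartheta^{(1)}) + 2\,\mathrm{cov}(\hat{r}_1, \vartheta^{(1)})} \sqrt{\mathrm{var}(\hat{r}_2) + \mathrm{var}(\vartheta^{(2)}) + 2\,\mathrm{cov}(\hat{r}_2, \vartheta^{(2)})}} , \tag{41} \]

where \(\vartheta^{(1)}\) and \(\vartheta^{(2)}\) now denote the discretization errors of the returns.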



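Here, \(\hat{r}_1\) and \(\hat{r}_2\) refer to the discretized return time series. Analogously to the correlation between price changes, the
individual terms can be estimated, but in addition, the starting prices \(S_1\) and \(S_2\) need to be parameterized. We use the
variables \(k\) and \(l\) for this. \(q_1 K_-\) represents the minimum price within the observed time series of the first stock, while \(q_1 K_+\)
represents the maximum price; \(q_2 L_-\) and \(q_2 L_+\) are defined analogously for the second stock. \(T_{n,m,k,l}\) represents the number of
pairs whose returns equal \((q_1 n)/(q_1 k) = n/k\) and \((q_2 m)/(q_2 l) = m/l\). Similarly, \(T_{n,k}\) (and analogously \(T_{m,l}\)) refers to the number
of returns (from a single time series) that are equal to \(n/k\). Thus, we obtain

\[ \mathrm{cov}(\hat{r}_1, \vartheta^{(2)}) \approx \frac{1}{T} \sum_{n=N_-}^{N_+} \sum_{m=M_-}^{M_+} \sum_{k=K_-}^{K_+} \sum_{l=L_-}^{L_+} T_{n,m,k,l}\, \frac{n}{k} \langle \vartheta_{n,m,k,l}^{(2)} \rangle - \langle \hat{r}_1 \rangle \langle \vartheta^{(2)} \rangle , \tag{42} \]

\[ \mathrm{cov}(\hat{r}_2, \vartheta^{(1)}) \approx \frac{1}{T} \sum_{m=M_-}^{M_+} \sum_{n=N_-}^{N_+} \sum_{l=L_-}^{L_+} \sum_{k=K_-}^{K_+} T_{n,m,k,l}\, \frac{m}{l} \langle \vartheta_{n,m,k,l}^{(1)} \rangle - \langle \hat{r}_2 \rangle \langle \vartheta^{(1)} \rangle , \tag{43} \]

\[ \mathrm{cov}(\hat{r}_1, \vartheta^{(1)}) \approx \frac{1}{T} \sum_{n=N_-}^{N_+} \sum_{k=K_-}^{K_+} T_{n,k}\, \frac{n}{k} \langle \vartheta_{n,k}^{(1)} \rangle - \langle \hat{r}_1 \rangle \frac{1}{T} \sum_{n=N_-}^{N_+} \sum_{k=K_-}^{K_+} T_{n,k} \langle \vartheta_{n,k}^{(1)} \rangle , \tag{44} \]

\[ \mathrm{cov}(\hat{r}_2, \vartheta^{(2)}) \approx \frac{1}{T} \sum_{m=M_-}^{M_+} \sum_{l=L_-}^{L_+} T_{m,l}\, \frac{m}{l} \langle \vartheta_{m,l}^{(2)} \rangle - \langle \hat{r}_2 \rangle \frac{1}{T} \sum_{m=M_-}^{M_+} \sum_{l=L_-}^{L_+} T_{m,l} \langle \vartheta_{m,l}^{(2)} \rangle , \tag{45} \]

\[ \mathrm{var}(\vartheta^{(1)}) \approx \frac{q_1^2}{6} \left\langle \frac{1}{S_1^2} \right\rangle , \tag{46} \]

\[ \mathrm{var}(\vartheta^{(2)}) \approx \frac{q_2^2}{6} \left\langle \frac{1}{S_2^2} \right\rangle . \tag{47} \]

The terms \(\langle \vartheta_{n,k}^{(1)} \rangle\) and \(\langle \vartheta_{n,m,k,l}^{(1)} \rangle\), and analogously the corresponding terms for \(\vartheta^{(2)}\), in equations (42) to (45) can be estimated as

\[ \langle \vartheta_{n,k}^{(1)} \rangle \approx \frac{\langle \vartheta_n^{(1)} \rangle}{k q_1} , \qquad \langle \vartheta_{n,m,k,l}^{(1)} \rangle \approx \frac{\langle \vartheta_{n,m}^{(1)} \rangle}{k q_1} , \tag{48} \]

with \(\langle \vartheta_n^{(1)} \rangle\) and \(\langle \vartheta_{n,m}^{(1)} \rangle\) from equations (37) and (38).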

We note that the correlation between \(\Delta S\) and \(S\) is neglected in this approximation. Also, the discretization of the prices
in the denominator of the return is not compensated. However, the model results in the next section demonstrate that
this simplification induces only a minor error.

The presented framework can also be used to calculate the impact of specific trading strategies on the distortion of
correlation coefficients. In that case, the distribution of discretization errors (equation (36)) needs to be chosen in a
suitable manner.



3.4. Results / Impact on the Epps effect 

Having developed a method for compensating the discretization error in the calculation of correlations, we now verify
it in a model setup and apply it to empirical data. We perform the compensation for different return intervals
in order to examine whether there is also a connection to the Epps effect.

The Epps effect refers to the decay of the correlation coefficient towards small return intervals. Consequently, financial
correlations of returns based on intervals below a certain limit (e.g., 30 minutes) are unreliable. The ability
to calculate the correlation structure on small return intervals would yield an improved statistical significance or
more recent information. In previous studies, the asynchrony of the time series has been identified as a
major cause of the Epps effect [21, 22]. In the following, we demonstrate that the price discretization can make a
sizable contribution to the Epps effect as well.

As the typical absolute price change becomes smaller for shorter return intervals [23], the width of the price
change distribution decreases as well. Since the tick-size remains constant, the relative discretization error increases. Hence,
the tick-size should also have an impact on the Epps effect, especially for stocks that are traded at low prices.
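This argument can be turned into a rough order-of-magnitude estimate. Assuming that the fractional part of the price (in units of \(q\)) is uniformly distributed, a rounded price change is unbiased on average, \(\langle \Delta\hat{S} \mid \Delta S \rangle = \Delta S\), and its error has a variance of about \(q^2/6\) (the variance of the triangular distribution (36)) whenever the price change distribution is not much narrower than \(q\). Two such independent errors leave the covariance unchanged but inflate each variance, so for two stocks with equal price change standard deviation \(\sigma\) the measured correlation shrinks by the factor \(\sigma^2/(\sigma^2 + q^2/6)\). The sketch below evaluates this factor for a geometric Brownian motion with a per-second volatility of \(10^{-4}\). This volatility, the prices and the function name are illustrative assumptions, and the estimate is not the full compensation method of this paper.

```python
import math

def attenuation(price_in_ticks, vol_per_second, dt_seconds):
    """Approximate ratio corr_measured / corr_true for two stocks with equal volatility,
    assuming independent, unbiased rounding errors of variance q^2/6 (q = 1 tick)."""
    sigma_ticks = price_in_ticks * vol_per_second * math.sqrt(dt_seconds)  # std of price change in ticks
    var = sigma_ticks**2
    return var / (var + 1.0 / 6.0)

for price in (1000, 10000):  # $10 and $100 at a tick-size of $0.01
    print(price, [round(attenuation(price, 1e-4, dt), 3) for dt in (60, 300, 1800)])
```

Under these assumptions, a $10 stock loses about 22% of its correlation on 1-minute returns but less than 1% on 30-minute returns, while a $100 stock is hardly affected.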

3.4.1. Model results 

Before applying the method to estimate the discretization error in empirical data, we evaluate it in a model setup. 
In addition, we will use the model to analyze the impact of each term from the decomposed correlation coefficient on 
the compensation. 



[Figure 6, panels (horizontal axis: \(\Delta t/60\); curves: \(\mathrm{cov}(\Delta S_1, \vartheta^{(2)})\) and \(\mathrm{cov}(\Delta S_2, \vartheta^{(1)})\); \(\mathrm{cov}(\vartheta^{(1)}, \vartheta^{(2)})\); \(\mathrm{var}(\vartheta^{(1)})\) and \(\mathrm{var}(\vartheta^{(2)})\); \(\mathrm{cov}(\Delta S_1, \vartheta^{(1)})\) and \(\mathrm{cov}(\Delta S_2, \vartheta^{(2)})\)): (a) \(c = 0.2\), \(S_0^{(1)} = 1000\), \(S_0^{(2)} = 1000\); (b) \(c = 0.4\), \(S_0^{(1)} = 1000\), \(S_0^{(2)} = 1000\); (c) \(c = 0.8\), \(S_0^{(1)} = 1000\), \(S_0^{(2)} = 1000\); (d) \(c = 0.2\), \(S_0^{(1)} = 1000\), \(S_0^{(2)} = 10000\); (e) \(c = 0.4\), \(S_0^{(1)} = 1000\), \(S_0^{(2)} = 10000\); (f) \(c = 0.8\), \(S_0^{(1)} = 1000\), \(S_0^{(2)} = 10000\).]

Figure 6: Impact of each term of the compensation method on the correlation coefficient between price changes.
Figure 7: Benchmark of the error estimation: comparison between real and estimated discretization errors within the model setup.



We begin by generating underlying correlated time series using the Capital Asset Pricing Model (CAPM)
[24], which, in its one-factor form, is known as Noh's model [25] in physics:

\[ r^{(i)}(t) = \sqrt{c}\, \eta(t) + \sqrt{1 - c}\, \varepsilon^{(i)}(t) . \tag{49} \]

Here, \(r^{(i)}(t)\) stands for the return of the \(i\)-th stock at time \(t\), and \(c\) is the correlation coefficient. The random variables \(\eta\)
and \(\varepsilon^{(i)}\) are drawn from standard normal distributions. Two return time series \(r^{(1)}\) and \(r^{(2)}\) are generated, representing
two correlated stocks. The length of these time series is chosen as \(7.2 \cdot 10^6\), corresponding to a return interval \(\Delta t\) of
1 second during 1 trading year.

Using these returns, we generate two price time series \(S^{(1)}\) and \(S^{(2)}\) that perform a geometric Brownian motion
with zero drift and a standard deviation of \(10^{-4}\) per time step. The initial prices \(S_{t=0}\) were set to 1000 and
10000. In the next step, we round the prices to integer values. An integer price of 1000, for example, then corresponds
to a price of $10 at a tick-size of $0.01.

Now we are able to construct the discretized return time series \(\hat{r}^{(i)}\) from these discretized prices, using return
intervals from 60 data points (corresponding to 1 minute) to 1800 data points (corresponding to 30 minutes).

As we know the actual discretization errors in the model, we can use them to evaluate our error estimates. A comparison
of the estimated average discretization errors with the actual discretization errors is shown in Fig. 7. The estimated
values show excellent agreement with the original values. We restrict the interpolation to a single Gaussian fit, as
we know the type of the price change distributions in this case. Thus, we verify the accuracy of the estimation itself,
not the suitability of the interpolation.



Before we perform the compensation, we examine how much impact each correction term, equations (42) to (47), has. We quantify the impact by evaluating equation (41) and subtracting the value of the same expression with the regarded term set to zero. With this method, we can see how the correlation coefficient changes if a certain term of the discretization compensation is neglected. Figure 6 illustrates the results of this analysis for different start prices and correlation coefficients. It turns out that only equations (44) to (47) provide a sizable contribution to the compensation. Therefore, we restrict our compensation to the calculation of these terms. Since the terms (44) to (47) only appear in the correction of the standard deviations of the individual returns, this implies that the distortion of the correlation coefficient is mainly caused by an improper normalization of the returns.
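The term-by-term analysis can be illustrated with a hypothetical stand-in for the corrected correlation coefficient, since the correction terms themselves are not reproduced in this section. Here each correction term is treated as a number subtracted either from the covariance or from one of the two variances before the coefficient is formed. The names and this additive structure are assumptions; the sketch only shows the ablation procedure of setting one term to zero and recording the change.

```python
import math


def corrected_correlation(cov, var_a, var_b, corrections):
    """Correlation after subtracting discretization corrections.

    corrections maps a term name to (target, value), where target is
    "cov", "var_a" or "var_b" and value is subtracted from that quantity.
    """
    adjusted = {"cov": cov, "var_a": var_a, "var_b": var_b}
    for target, value in corrections.values():
        adjusted[target] -= value
    return adjusted["cov"] / math.sqrt(adjusted["var_a"] * adjusted["var_b"])


def term_impacts(cov, var_a, var_b, corrections):
    """Change of the corrected coefficient when each term is set to zero."""
    full = corrected_correlation(cov, var_a, var_b, corrections)
    impacts = {}
    for name, (target, _) in corrections.items():
        ablated = dict(corrections)
        ablated[name] = (target, 0.0)
        impacts[name] = full - corrected_correlation(cov, var_a, var_b, ablated)
    return impacts
```

In this toy structure, dropping a variance correction leaves the variance inflated and pulls the coefficient down, which is the normalization effect described above.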

With these terms, we are able to compensate the discretization effects. We first focus on the correlations between price changes. As shown in Fig. 8(a) and 8(b), the correlation coefficient decays towards smaller price change intervals. Hence, the discretization is also a cause of the Epps effect. It becomes especially relevant when the ratio of the price to the tick-size is small. Remarkably, this scaling behavior is observed even though the time series are synchronous. In our simulation, the effect vanishes when both prices start at a value of 10000, as Fig. 8(c) illustrates.
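A back-of-the-envelope estimate shows why the price-to-tick ratio matters. For a zero-drift geometric Brownian motion with per-step log-return standard deviation σ, a k-step price change near the start price $S_0$ has a standard deviation of roughly $S_0 \sigma \sqrt{k}$; dividing by the tick (one integer unit) gives the typical move in ticks. The value σ = 10^{-4} below is an assumption for illustration, and the function name is hypothetical.

```python
import math


def typical_move_in_ticks(s0, sigma, k, tick=1.0):
    """Approximate std of a k-step price change near s0, measured in ticks."""
    return s0 * sigma * math.sqrt(k) / tick


for s0 in (1000, 10000):
    for k in (60, 1800):  # 1 minute and 30 minutes of 1-second steps
        print(f"S0={s0:>5}  k={k:>4}  ~{typical_move_in_ticks(s0, 1e-4, k):.2f} ticks")
```

Under these assumptions, a 1-minute move at a start price of 1000 is typically below one tick (about 0.77), so rounding dominates the measured variance, whereas at 10000 it spans about 7.7 ticks.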
Figure 8: Scaling behavior of the correlation coefficient of price changes due to the discretization error in the model setup. The dashed line represents the presented correction.

Figure 9: Scaling behavior of the correlation coefficient of returns due to the discretization error in the model setup. The dashed line represents the presented correction.
Figure 10: Tick-size compensation of the correlation coefficient for two ensembles, each consisting of the 25 most highly correlated stock pairs from the S&P 500 index with an average price between $0.01 and $10.00 (a) and between $10.01 and $20.00 (b), respectively. The correlation coefficients have been normalized to their saturation value at approximately 30 min. The error bars represent twice the standard deviation, 2σ. For a return interval of 30 minutes, the correlation coefficients average 0.19 (a) and 0.37 (b).



When we apply the compensation method to return time series, as illustrated in Fig. 9, we are also able to correct the discretization error almost completely. The slight decay of the corrected correlation coefficient on very small return intervals is due to the approximations stated at the end of Section 3.3. First, we neglect the correlation between price changes and prices. Second, even though the discretization of price changes is corrected, the price discretization in the denominator of the return is neglected. Including these effects could further improve the compensation. However, this would require further assumptions on the price process and would increase the necessary computing time dramatically. Thus, we restrict ourselves to the presented compensation.

3.4.2. Empirical results 

How large is the contribution of the discretization effect to the Epps effect? To answer this question, we apply the compensation to empirical data from the NYSE TAQ database [14]. Here, we use a power-law approach for the interpolation of the price change distribution, since the model results indicate that the discretization effects are mainly relevant for small return intervals, and on small return intervals, power-law tails describe the distribution satisfactorily [26]. For each value of the discrete distribution, we perform an individual least squares fit of $a$ and $b$ in $D(x) = a\,x^{-b}$, using that value and its two nearest neighbors on each side. For the central part of the distribution, we perform a Gaussian fit instead.
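The local power-law interpolation could be sketched as below. We assume the power law has the form D(x) = a x^{-b}, that it is fitted on one tail with positive abscissae (for example |ΔS| in ticks), and that the fit is linearized as ln D = ln a - b ln x and solved by ordinary least squares over each five-point window (the point plus two neighbors on each side). The paper may instead fit in linear space; the log-log choice and the function name are our assumptions. Windows that would run past the ends of the distribution are left unfitted, and the Gaussian fit for the central part is not shown.

```python
import math


def local_powerlaw_fits(xs, ps, half_width=2):
    """For each point of a discrete distribution (xs > 0, ps > 0), fit
    p(x) = a * x**(-b) in log-log space using the point and its half_width
    neighbours on each side. Returns a list of (a, b), or None where the
    window would extend past either end."""
    fits = []
    for i in range(len(xs)):
        lo, hi = i - half_width, i + half_width + 1
        if lo < 0 or hi > len(xs):
            fits.append(None)
            continue
        lx = [math.log(x) for x in xs[lo:hi]]
        ly = [math.log(p) for p in ps[lo:hi]]
        mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
        # Ordinary least squares slope and intercept of ln p against ln x.
        slope = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
                 / sum((u - mx) ** 2 for u in lx))
        intercept = my - slope * mx
        fits.append((math.exp(intercept), -slope))
    return fits
```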

It is particularly important that prices are not adjusted for stock splits, so that the correct tick-size is maintained. Consequently, overnight returns, which would contain the unadjusted split jumps, have to be excluded. To analyze the impact of the discretization effect, we construct two ensembles of stocks from the S&P 500 index (see Tables A.2 and A.3). The first ensemble consists of stocks with an average price between $0.01 and $10.00, the second of stocks with an average price between $10.01 and $20.00. Each ensemble is composed of the 25 stock pairs with the highest correlation during the year 2007, based on daily data.
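The ensemble construction can be sketched as follows: keep the stocks whose average price lies in the target band, compute the correlation of daily returns for every pair, and retain the 25 most correlated pairs. The input layout (dictionaries keyed by ticker, with daily returns already aligned by date) and the helper names are assumptions for illustration.

```python
import itertools
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_pairs(mean_price, daily_returns, low, high, n_pairs=25):
    """Return the n_pairs most correlated (ticker_a, ticker_b, corr) triples
    among tickers whose mean price lies in [low, high]."""
    eligible = sorted(t for t, p in mean_price.items() if low <= p <= high)
    scored = [(a, b, pearson(daily_returns[a], daily_returns[b]))
              for a, b in itertools.combinations(eligible, 2)]
    scored.sort(key=lambda triple: triple[2], reverse=True)
    return scored[:n_pairs]


# ensemble_low = select_pairs(mean_price, daily_returns, 0.01, 10.00)
# ensemble_mid = select_pairs(mean_price, daily_returns, 10.01, 20.00)
```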



As Fig. 10 demonstrates, we are able to compensate the impact of the tick-size on the correlation coefficient in empirical market data. The decay cannot be corrected completely with the presented method, since the discretization effect is superimposed on other causes of the Epps effect, such as asynchronous [21] or lagged [12] [27] time series. However, we are able to quantify the contribution of this particular effect to the Epps effect. Our results show that the discretization effect can be responsible for up to 40% of the Epps effect, which we define as the difference between the correlation coefficient at a given return interval and its saturation value. The contribution is particularly large for stocks that are traded at low prices.
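One way to turn this definition into a number at a given return interval is to measure how much of the gap between the measured correlation and its saturation value the compensation closes. The formula below, share = (ρ_compensated - ρ_raw) / (ρ_saturation - ρ_raw), is our reading of the text rather than a formula stated by it, and the function name and the input values in the example are made up for illustration.

```python
def discretization_share(rho_raw, rho_compensated, rho_saturation):
    """Fraction of the Epps effect at one return interval that the tick-size
    compensation removes; the Epps effect is rho_saturation - rho_raw."""
    epps_gap = rho_saturation - rho_raw
    if epps_gap <= 0:
        raise ValueError("no Epps effect at this return interval")
    return (rho_compensated - rho_raw) / epps_gap


# Made-up values on the normalized scale of Fig. 10 (saturation value = 1):
print(discretization_share(rho_raw=0.6, rho_compensated=0.76, rho_saturation=1.0))
```

With these illustrative inputs the share is approximately 0.4, i.e. 40% of the gap to the saturation value.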
4. Conclusion

We demonstrated the impact of the tick-size on the microstructure of financial returns. This structure can change the shape of the distributions of returns and price changes. If a stock exhibits a large price change during the observation period, the composition of the return distribution can lead to heavier tails. We also showed that the return distribution consists of return subset distributions that are more sparsely distributed than the complete distribution.

Furthermore, we demonstrated that the discretization effects can distort the calculation of correlation coefficients, especially if the stocks are traded at low prices. We showed that the distortion of the correlation coefficient is mainly caused by an improper normalization of the returns. This distortion depends on the impact of the discretization, which grows for small return intervals. Therefore, the observed behavior contributes to the Epps effect.

We developed a method to compensate these discretization effects, which we validated in a model setup. The compensation is based only on the tick-size. Apart from the interpolation of the price change distribution, the compensation is parameter-free. We also applied this method to market data, where we were able to identify and compensate the impact of the tick-size on the correlation coefficient. The results indicate that the discretization error makes a sizable contribution to the Epps effect for stocks that are traded at low prices.
Table A.1: Ensemble of 50 stocks from the S&P 500 index with at least 1000 trades per day. The stocks show the highest ratio σ_S/⟨S⟩ between the standard deviation σ_S of the price and the mean price ⟨S⟩.



Symbol  Name                            Stock exchange      ⟨S⟩     σ_S  σ_S/⟨S⟩  Average trades
BSC     Bear Stearns Cos.               NYSE             133.14   23.90    0.180           25529
SII     Smith International             NYSE              57.12   10.46    0.183           13949
FRE     Federal Home Loan Mtg.          NYSE              57.87   10.80    0.187           21389
LSI     LSI Corporation                 NYSE               7.93    1.49    0.187           20497
PCAR    PACCAR Inc.                     NASDAQ            74.10   13.89    0.187           11376
JCP     Penney (J.C.)                   NYSE              69.49   13.08    0.188           15925
CVG     Convergys Corp.                 NYSE              21.77    4.16    0.191            5147
CBG     CB Richard Ellis Group          NYSE              31.58    6.08    0.193           13992
LIZ     Liz Claiborne Inc.              NYSE              35.80    6.97    0.195            6330
BC      Brunswick Corp.                 NYSE              28.09    5.54    0.197            5387
CNX     CONSOL Energy Inc.              NYSE              45.20    8.99    0.199           12842
AKAM    Akamai Technologies Inc         NASDAQ            42.89    8.63    0.201           22620
MCO     Moody's Corp                    NYSE              56.88   11.87    0.209           16294
RSH     Radio Shack Corp                NYSE              24.72    5.17    0.209           14454
LUK     Leucadia National Corp.         NYSE              38.25    8.05    0.211            3429
FLR     Fluor Corp. (New)               NYSE             115.34   24.90    0.216            7413
NCC     National City Corp.             NYSE              30.54    6.71    0.220           17876
LXK     Lexmark Int'l Inc               NYSE              48.47   10.91    0.225            8838
DF      Dean Foods                      NYSE              32.98    7.49    0.227            6647
MBI     MBIA Inc.                       NYSE              58.90   13.50    0.229           17571
FCX     Freeport-McMoRan Cp & Gld       NYSE              82.68   19.07    0.231           45292
FHN     First Horizon National          NYSE              34.09    7.88    0.231            7319
ESRX    Express Scripts                 NASDAQ            70.84   16.43    0.232           11623
JNPR    Juniper Networks                NASDAQ            27.11    6.35    0.234           33006
DDS     Dillard Inc.                    NYSE              28.97    6.79    0.234            8291
CMI     Cummins Inc.                    NYSE             117.32   47.60    0.406            9909
JNY     Jones Apparel Group             NYSE              26.09    6.24    0.239            5803
MON     Monsanto Co.                    NYSE              70.65   16.94    0.240           17403
SOV     Sovereign Bancorp               NYSE              20.13    4.84    0.240           10837
CMCSA   Comcast Corp.                   NASDAQ            27.05    6.64    0.246           55284
OMX     OfficeMax Inc.                  NYSE              39.68    9.85    0.248            6009
WM      Washington Mutual               NYSE              36.55    9.09    0.249           39145
KG      King Pharmaceuticals            NYSE              16.29    4.15    0.254            9548
JEC     Jacobs Engineering Group        NYSE              71.33   32.20    0.451            5501
CIT     CIT Group                       NYSE              46.43   12.31    0.265           11141
THC     Tenet Healthcare Corp.          NYSE               5.66    1.51    0.267           12112
KBH     KB Home                         NYSE              37.21   10.34    0.278           16670
GME     GameStop Corp.                  NYSE              47.04   23.36    0.497            9619
CTX     Centex Corp.                    NYSE              37.79   10.53    0.279           14328
CTSH    Cognizant Technology Solutions  NASDAQ            72.31   20.33    0.281           13112
ODP     Office Depot                    NYSE              28.13    7.92    0.282           14062
NOV     National Oilwell Varco Inc.     NYSE              88.30   30.40    0.344           19716
GILD    Gilead Sciences                 NASDAQ            57.24   17.77    0.310           26240
ABK     Ambac Financial Group           NYSE              70.98   23.27    0.328           16261
PHM     Pulte Homes Inc.                NYSE              21.82    7.34    0.336           15737
LEN     Lennar Corp.                    NYSE              35.41   11.91    0.336           16250
MTG     MGIC Investment                 NYSE              46.61   17.01    0.365           14053
CC      Circuit City Group              NYSE              13.65    5.00    0.366           16660
ETFC    E*Trade Financial Corp.         NASDAQ            17.65    6.86    0.389           33380
CFC     Countrywide Financial Corp.     NYSE              28.75   11.41    0.397           65703



Table A.2: Ensemble of the 25 most highly correlated stock pairs from the S&P 500 index whose average price lies between $0.01 and $10.00. The column var_corr gives the variance of the correlation coefficient in a moving 30-day window.



Stock 1                                                        Stock 2
Symbol  Name                      Stock exchange  Mean price   Symbol  Name                      Stock exchange  Mean price  corr  var_corr
F       Ford Motor                NYSE                  8.16   Q       Qwest Communications Int  NYSE                  8.63  0.11      0.02
Q       Qwest Communications Int  NYSE                  8.63   CPWR    Compuware Corp.           NASDAQ                9.44  0.12      0.03
CPWR    Compuware Corp.           NASDAQ                9.44   UIS     Unisys Corp.              NYSE                  7.65  0.13      0.10
CPWR    Compuware Corp.           NASDAQ                9.44   THC     Tenet Healthcare Corp.    NYSE                  5.66  0.14      0.03
NOVL    Novell Inc.               NASDAQ                7.21   F       Ford Motor                NYSE                  8.16  0.14      0.02
F       Ford Motor                NYSE                  8.16   LSI     LSI Corporation           NYSE                  7.93  0.15      0.03
Q       Qwest Communications Int  NYSE                  8.63   THC     Tenet Healthcare Corp.    NYSE                  5.66  0.15      0.03
F       Ford Motor                NYSE                  8.16   CPWR    Compuware Corp.           NASDAQ                9.44  0.15      0.02
Q       Qwest Communications Int  NYSE                  8.63   LSI     LSI Corporation           NYSE                  7.93  0.16      0.03
UIS     Unisys Corp.              NYSE                  7.65   THC     Tenet Healthcare Corp.    NYSE                  5.66  0.17      0.05
Q       Qwest Communications Int  NYSE                  8.63   UIS     Unisys Corp.              NYSE                  7.65  0.18      0.05
DYN     Dynegy Inc.               NYSE                  8.77   THC     Tenet Healthcare Corp.    NYSE                  5.66  0.18      0.02
Q       Qwest Communications Int  NYSE                  8.63   DYN     Dynegy Inc.               NYSE                  8.77  0.19      0.03
DYN     Dynegy Inc.               NYSE                  8.77   LSI     LSI Corporation           NYSE                  7.93  0.21      0.03
UIS     Unisys Corp.              NYSE                  7.65   LSI     LSI Corporation           NYSE                  7.93  0.22      0.05
LSI     LSI Corporation           NYSE                  7.93   THC     Tenet Healthcare Corp.    NYSE                  5.66  0.22      0.02
NOVL    Novell Inc.               NASDAQ                7.21   UIS     Unisys Corp.              NYSE                  7.65  0.22      0.04
CPWR    Compuware Corp.           NASDAQ                9.44   LSI     LSI Corporation           NYSE                  7.93  0.23      0.04
F       Ford Motor                NYSE                  8.16   UIS     Unisys Corp.              NYSE                  7.65  0.23      0.04
DYN     Dynegy Inc.               NYSE                  8.77   CPWR    Compuware Corp.           NASDAQ                9.44  0.24      0.03
NOVL    Novell Inc.               NASDAQ                7.21   CPWR    Compuware Corp.           NASDAQ                9.44  0.25      0.06
NOVL    Novell Inc.               NASDAQ                7.21   DYN     Dynegy Inc.               NYSE                  8.77  0.26      0.04
NOVL    Novell Inc.               NASDAQ                7.21   LSI     LSI Corporation           NYSE                  7.93  0.29      0.07
DYN     Dynegy Inc.               NYSE                  8.77   UIS     Unisys Corp.              NYSE                  7.65  0.29      0.03
F       Ford Motor                NYSE                  8.16   DYN     Dynegy Inc.               NYSE                  8.77  0.39      0.03
Table A.3: Ensemble of the 25 most highly correlated stock pairs from the S&P 500 index whose average price lies between $10.01 and $20.00. The column var_corr gives the variance of the correlation coefficient in a moving 30-day window.





Stock 1                                                        Stock 2
Symbol  Name                      Stock exchange  Mean price   Symbol  Name                      Stock exchange  Mean price  corr  var_corr
CMS     CMS Energy                NYSE                 17.18   CZN     Citizens Communications   NYSE                 14.34  0.37      0.05
XRX     Xerox Corp.               NYSE                 17.45   AW      Allied Waste Industries   NYSE                 12.71  0.37      0.04
MU      Micron Technology         NYSE                 11.38   TER     Teradyne Inc.             NYSE                 15.07  0.37      0.04
IPG     Interpublic Group         NYSE                 11.23   HBAN    Huntington Bancshares     NASDAQ               19.95  0.38      0.03
TE      TECO Energy               NYSE                 16.96   HBAN    Huntington Bancshares     NASDAQ               19.95  0.38      0.05
TE      TECO Energy               NYSE                 16.96   EP      El Paso Corp.             NYSE                 16.08  0.38      0.02
AN      AutoNation Inc.           NYSE                 19.95   IPG     Interpublic Group         NYSE                 11.23  0.39      0.03
AW      Allied Waste Industries   NYSE                 12.71   HBAN    Huntington Bancshares     NASDAQ               19.95  0.39      0.03
LUV     Southwest Airlines        NYSE                 14.78   HBAN    Huntington Bancshares     NASDAQ               19.95  0.39      0.03
DUK     Duke Energy               NYSE                 19.34   CMS     CMS Energy                NYSE                 17.18  0.40      0.03
JDSU    JDS Uniphase Corp.        NASDAQ               14.72   MU      Micron Technology         NYSE                 11.38  0.40      0.07
AN      AutoNation Inc.           NYSE                 19.95   TER     Teradyne Inc.             NYSE                 15.07  0.40      0.02
SLE     Sara Lee Corp.            NYSE                 16.73   EP      El Paso Corp.             NYSE                 16.08  0.41      0.09
CNP     CenterPoint Energy        NYSE                 17.52   EP      El Paso Corp.             NYSE                 16.08  0.41      0.03
TSN     Tyson Foods               NYSE                 19.06   ETFC    E*Trade Financial Corp.   NASDAQ               17.65  0.41      0.05
TER     Teradyne Inc.             NYSE                 15.07   HBAN    Huntington Bancshares     NASDAQ               19.95  0.42      0.04
CMS     CMS Energy                NYSE                 17.18   CNP     CenterPoint Energy        NYSE                 17.52  0.43      0.06
HCBK    Hudson City Bancorp       NASDAQ               13.85   HBAN    Huntington Bancshares     NASDAQ               19.95  0.44      0.03
DUK     Duke Energy               NYSE                 19.34   WIN     Windstream Corporation    NYSE                 14.26  0.44      0.11
LUV     Southwest Airlines        NYSE                 14.78   AN      AutoNation Inc.           NYSE                 19.95  0.45      0.03
CMS     CMS Energy                NYSE                 17.18   EP      El Paso Corp.             NYSE                 16.08  0.45      0.03
TE      TECO Energy               NYSE                 16.96   CMS     CMS Energy                NYSE                 17.18  0.46      0.04
TE      TECO Energy               NYSE                 16.96   DUK     Duke Energy               NYSE                 19.34  0.47      0.05
AN      AutoNation Inc.           NYSE                 19.95   HBAN    Huntington Bancshares     NASDAQ               19.95  0.49      0.04
TE      TECO Energy               NYSE                 16.96   CNP     CenterPoint Energy        NYSE                 17.52  0.51      0.04